This sample consists of data from 150 subjects, drawn from the original sample of 522 who completed the initial battery of 37 cognitive tasks, 23 surveys, and 3 demographic surveys. Details of the original sample and of the quality control (QC) procedures are described elsewhere. Invited participants were chosen randomly, and only subsets were invited for a given batch (instead of opening the battery to all qualified subjects) to avoid oversampling and a potential bias toward “high self-regulators”.
workers = read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/Local/User_717570_workers.csv')
workers = workers %>%
group_by(Worker.ID) %>%
mutate(Retest_worker=ifelse(sum(CURRENT.RetestWorker,CURRENT.RetestWorkerB2,CURRENT.RetestWorkerB3,CURRENT.RetestWorkerB4,CURRENT.RetestWorkerB5,na.rm=T)>0,1,0)) %>%
ungroup()
worker_counts <- fromJSON('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/Local/retest_worker_counts.json')
worker_counts = as.data.frame(unlist(worker_counts))
names(worker_counts) = "task_count"
In total, 242 participants were invited, 175 began the battery, 157 completed it, and 150 provided data that passed QC at both time points. Our target sample size was determined before data collection began, and collection continued until this number of participants with data surviving QC was reached.
disc_comp_date = read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/Local/discovery_completion_dates.csv', header=FALSE)
val_comp_date = read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/Local/validation_completion_dates.csv', header=FALSE)
test_comp_date = rbind(disc_comp_date, val_comp_date)
rm(disc_comp_date, val_comp_date)
retest_comp_date = read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/Local/retest_completion_dates.csv', header=FALSE)
comp_dates = merge(retest_comp_date, test_comp_date, by="V1")
names(comp_dates) <- c("sub_id", "retest_comp", "test_comp")
comp_dates$retest_comp = as.Date(comp_dates$retest_comp)
comp_dates$test_comp = as.Date(comp_dates$test_comp)
comp_dates$days_btw = with(comp_dates, retest_comp-test_comp)
Retest data collection took place on average 115 days after completion of the initial battery, with a range of 60 to 228 days.
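The delay statistics above come directly from the days_btw column built in the previous chunk; a minimal, self-contained sketch of the same computation on hypothetical dates and subject IDs:

```r
# Minimal sketch of the delay computation above, on hypothetical dates
comp_dates_toy <- data.frame(
  sub_id      = c("s001", "s002", "s003"),
  retest_comp = as.Date(c("2017-03-01", "2017-02-15", "2017-04-10")),
  test_comp   = as.Date(c("2016-11-01", "2016-12-01", "2016-09-15"))
)
# subtraction of Date columns yields a difftime in days
comp_dates_toy$days_btw <- with(comp_dates_toy, retest_comp - test_comp)
mean(as.numeric(comp_dates_toy$days_btw))   # 134.33...
range(as.numeric(comp_dates_toy$days_btw))  # 76 207
```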
This report uses the variables previously designated as “meaningful”. This is not the smallest possible subset: variables that are correlated with other variables within a task are retained, as are both accuracies and both kinds of DDM parameters (EZ and HDDM). It does not include variables that were considered of no theoretical interest in the analyses of the time 1 (t1) data; those excluded variables are listed for reference as well.
test_data <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Complete_01-31-2017/meaningful_variables.csv')
test_data2 <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Complete_01-31-2017/meaningful_variables_noDDM.csv')
test_data3 <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Complete_01-31-2017/meaningful_variables_EZ.csv')
test_data <- merge(test_data, test_data2)
test_data <- merge(test_data, test_data3)
rm(test_data2, test_data3)
For reference, here are the variables excluded from the remainder of this report because they have not been of theoretical interest in factor structure analyses of these data so far. They include drift diffusion and other model parameters for specific conditions within a task; survey variables that are not part of that survey's dependent variables in the literature; and demographics (which are reserved for prediction analyses).
test_data2 <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Complete_01-31-2017/variables_exhaustive.csv')
test_data2$X <- as.character(test_data2$X)
names(test_data2)[1] <- 'sub_id'
df <- data.frame(names(test_data2)[which(names(test_data2) %in% names(test_data) == FALSE)])
names(df) = c('vars')
datatable(df)
retest_data <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/meaningful_variables.csv')
retest_data2 <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/meaningful_variables_noDDM.csv')
retest_data3 <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/meaningful_variables_EZ.csv')
retest_data <- merge(retest_data, retest_data2)
retest_data <- merge(retest_data, retest_data3)
rm(retest_data2, retest_data3)
Extract t2 participants with good t1 data: we only invited workers with good t1 data, but due to a technical error a handful of subjects who had not been checked for data quality completed the retest battery as well. One of these subjects provided good t2 data but did not have good t1 data and is therefore removed from further analyses.
#Process sub_id columns in the data dataframes
retest_data$X <- as.character(retest_data$X)
test_data$X <- as.character(test_data$X)
names(retest_data)[1] <- 'sub_id'
names(test_data)[1] <- 'sub_id'
retest_data <- retest_data[retest_data$sub_id %in% test_data$sub_id,]
The cleaning functions here are translated from the Python pipeline to mimic the procedure that was applied to meaningful_variables in creating meaningful_variables_clean.
The original cleaning procedures include three steps:
1. Removing outliers for each variable, where outliers are defined as values more than 2.5 IQR from the median. This step is skipped here because removing outliers from either dataset could lead to an unnecessary increase in missing data. Since we are interested in the distance between the two measurements of a metric for a given subject, rather than in the full distribution of each metric, this seemed appropriate.
2. Removing correlated variables within a task. In creating meaningful_variables_clean, if two variables for a given task correlated >.85, only one was kept. This step is skipped here because we want to examine the reliability of as many variables as possible.
3. Log transforming skewed variables. In cleaning the t1 data, variables with |skew| > 1 were log transformed, and those whose skew could not be removed successfully were dropped. Here we transform all variables that have |skew| > 1 in the t1 data, but variables that are not successfully transformed are listed rather than dropped.
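The |skew| > 1 rule in step 3 relies on psych::skew; for intuition, here is a base-R version of the (type 1, population) skewness statistic applied to a hypothetical vector. Note that psych::skew defaults to a slightly different small-sample correction (type 3), so the numbers differ a little, but the sign and the |skew| > 1 decision behave the same way:

```r
# Base-R population (type 1) skewness, for intuition about the |skew| > 1 rule;
# the pipeline itself uses psych::skew
skew_g1 <- function(x){
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2)^1.5)
}
x <- c(1, 1, 1, 1, 10)   # strong right tail
skew_g1(x)               # 1.5, so this variable would be log transformed
abs(skew_g1(x)) > 1      # TRUE
```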
remove_outliers = function(data_column, quantile_range = 2.5){
  # quantile() with default probs returns the 0/25/50/75/100th percentiles
  q_25 = quantile(data_column, na.rm=T)[2]
  q_50 = quantile(data_column, na.rm=T)[3]
  q_75 = quantile(data_column, na.rm=T)[4]
  # values more than quantile_range * IQR away from the median become NA
  lowlimit = q_50 - quantile_range*(q_75 - q_25)
  highlimit = q_50 + quantile_range*(q_75 - q_25)
  data_column = ifelse(data_column<lowlimit, NA, ifelse(data_column>highlimit, NA, data_column))
  return(data_column)
}
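Although the outlier step is skipped in this report, the function's behavior is easy to see on a toy vector (the definition is repeated here so the example runs standalone):

```r
# remove_outliers (as defined above) replaces extreme values with NA;
# repeated here so the example is self-contained
remove_outliers = function(data_column, quantile_range = 2.5){
  q_25 = quantile(data_column, na.rm=T)[2]
  q_50 = quantile(data_column, na.rm=T)[3]
  q_75 = quantile(data_column, na.rm=T)[4]
  lowlimit = q_50 - quantile_range*(q_75 - q_25)
  highlimit = q_50 + quantile_range*(q_75 - q_25)
  ifelse(data_column<lowlimit, NA, ifelse(data_column>highlimit, NA, data_column))
}
x <- c(1:20, 1000)
out <- remove_outliers(x)
# median 11, IQR 10, so the limits are [-14, 36]: only 1000 becomes NA
sum(is.na(out))    # 1
which(is.na(out))  # 21
```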
"%w/o%" <- function(x, y) x[!x %in% y]
pos_log <- function(column){
  # shift so the minimum becomes 1, then log (handles zero/negative values)
  col_min = min(column, na.rm=T)
  a = 1-col_min
  column = column+a
  return(log(column))
}
neg_log <- function(column){
  # reflect around the maximum, then log (for negatively skewed variables)
  col_max = max(column, na.rm=T)
  column = col_max+1-column
  return(log(column))
}
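Both helpers shift their input so the smallest log argument is exactly 1; a quick check on a toy vector (the helpers are repeated so the example runs standalone):

```r
# pos_log / neg_log (as defined above), repeated so the example is self-contained
pos_log <- function(column){
  col_min = min(column, na.rm=T)
  log(column + 1 - col_min)   # shifts the minimum to 1, so log(min) = 0
}
neg_log <- function(column){
  col_max = max(column, na.rm=T)
  log(col_max + 1 - column)   # reflects, so log(max) = 0 and skew flips sign
}
x <- c(-2, 0, 3, 50)
min(pos_log(x))  # 0, attained at the former minimum
min(neg_log(x))  # 0, attained at the former maximum
```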
transform_remove_skew = function(data, columns, threshold = 1, drop=FALSE){
tmp = as.data.frame(apply(data[,columns],2,skew))
names(tmp) = c("skew")
tmp$dv = row.names(tmp)
tmp = tmp %>%
filter(abs(skew)>threshold)
skewed_variables = tmp$dv
skew_subset = data[, skewed_variables]
positive_subset = data[,tmp$dv[tmp$skew>0]]
negative_subset = data[,tmp$dv[tmp$skew<0]]
# transform variables
# log transform for positive skew
# positive_subset = log(positive_subset)
# positive_subset = pos_log(positive_subset) #slight divergence from original code
positive_subset = as.data.frame(apply(positive_subset, 2, pos_log))
successful_transforms = as.data.frame(apply(positive_subset, 2, skew))
names(successful_transforms) = c('skew')
successful_transforms$dv = row.names(successful_transforms)
successful_transforms = successful_transforms %>% filter(abs(skew)<threshold)
successful_transforms = successful_transforms$dv
successful_transforms = positive_subset[,successful_transforms]
dropped_vars = names(positive_subset) %w/o% names(successful_transforms)
cat(rep('*', 40))
cat('\n')
cat(paste0(length(names(positive_subset)) ,' positively skewed variables were transformed:'))
cat('\n')
cat(names(positive_subset), sep = '\n')
cat(rep('*', 40))
cat('\n')
# replace transformed variables
data = data[,-c(which(names(data) %in% names(positive_subset)))]
if(drop == TRUE){
names(successful_transforms) = paste0(names(successful_transforms), '.logTr')
cat(rep('*', 40))
cat('\n')
cat(paste0('Dropping ', length(dropped_vars) ,' positively skewed variables that could not be transformed successfully:'))
cat('\n')
cat(dropped_vars, sep = '\n')
cat(rep('*', 40))
cat('\n')
data = cbind(data, successful_transforms)
}
else{
names(positive_subset) = paste0(names(positive_subset), '.logTr')
cat(rep('*', 40))
cat('\n')
cat(paste0(length(dropped_vars) ,' positively skewed variables could not be transformed successfully:'))
cat('\n')
cat(dropped_vars, sep = '\n')
cat(rep('*', 40))
cat('\n')
data = cbind(data, positive_subset)
}
# reflected log transform for negative skew
negative_subset = as.data.frame(apply(negative_subset, 2, neg_log))
successful_transforms = as.data.frame(apply(negative_subset, 2, skew))
names(successful_transforms) = c('skew')
successful_transforms$dv = row.names(successful_transforms)
successful_transforms = successful_transforms %>% filter(abs(skew)<threshold)
successful_transforms = successful_transforms$dv
successful_transforms = negative_subset[,successful_transforms]
dropped_vars = names(negative_subset) %w/o% names(successful_transforms)
cat(rep('*', 40))
cat('\n')
cat(paste0(length(names(negative_subset)) ,' negatively skewed variables were transformed:'))
cat('\n')
cat(names(negative_subset), sep = '\n')
cat(rep('*', 40))
cat('\n')
# replace transformed variables
data = data[,-c(which(names(data) %in% names(negative_subset)))]
if(drop == TRUE){
names(successful_transforms) = paste0(names(successful_transforms), '.ReflogTr')
cat(rep('*', 40))
cat('\n')
cat(paste0('Dropping ', length(dropped_vars) ,' negatively skewed variables that could not be transformed successfully:'))
cat('\n')
cat(dropped_vars, sep = '\n')
cat(rep('*', 40))
cat('\n')
data = cbind(data, successful_transforms)
}
else{
names(negative_subset) = paste0(names(negative_subset), '.ReflogTr')
cat(rep('*', 40))
cat('\n')
cat(paste0(length(dropped_vars) ,' negatively skewed variables could not be transformed successfully:'))
cat('\n')
cat(dropped_vars, sep = '\n')
cat(rep('*', 40))
cat('\n')
data = cbind(data, negative_subset)
}
return(data)
}
Clean the t1 data on the full sample without dropping any columns. Columns that are not transformed successfully are listed.
numeric_cols = c()
for(i in 1:length(names(test_data))){
if(is.numeric(test_data[,i])){
numeric_cols <- c(numeric_cols, names(test_data)[i])
}
}
# clean_test_data = cbind(sub_id = test_data$sub_id, as.data.frame(apply(test_data[, -which(names(test_data) %in% c("sub_id"))], 2, remove_outliers)))
# clean_test_data = transform_remove_skew(clean_test_data, numeric_cols)
clean_test_data = transform_remove_skew(test_data, numeric_cols)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
44 positively skewed variables were transformed:
adaptive_n_back.mean_load
bickel_titrator.hyp_discount_rate_large
bickel_titrator.hyp_discount_rate_medium
bickel_titrator.hyp_discount_rate_small
bis11_survey.Motor
columbia_card_task_cold.gain_sensitivity
columbia_card_task_hot.gain_sensitivity
dickman_survey.dysfunctional
dospert_eb_survey.health_safety
impulsive_venture_survey.impulsiveness
kirby.hyp_discount_rate_large
kirby.hyp_discount_rate_medium
kirby.hyp_discount_rate_small
motor_selective_stop_signal.selective_proactive_control
simple_reaction_time.avg_rt
stop_signal.SSRT_high
stop_signal.SSRT_low
tower_of_london.avg_move_time
choice_reaction_time.hddm_thresh
directed_forgetting.proactive_interference_hddm_drift
dot_pattern_expectancy.hddm_thresh
motor_selective_stop_signal.hddm_thresh
shape_matching.hddm_thresh
stim_selective_stop_signal.hddm_thresh
stop_signal.hddm_thresh
stroop.hddm_thresh
threebytwo.hddm_thresh
attention_network_task.conflict_rt
choice_reaction_time.avg_rt
directed_forgetting.proactive_interference_acc
dot_pattern_expectancy.avg_rt
shape_matching.avg_rt
simon.avg_rt
threebytwo.avg_rt
threebytwo.cue_switch_cost_rt_100.0
threebytwo.cue_switch_cost_rt_900.0
adaptive_n_back.EZ_drift
adaptive_n_back.EZ_thresh
attention_network_task.conflict_EZ_non_decision
directed_forgetting.EZ_non_decision
motor_selective_stop_signal.EZ_thresh
stim_selective_stop_signal.EZ_thresh
stroop.EZ_non_decision
threebytwo.EZ_non_decision
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
22 positively skewed variables could not be transformed successfully:
bickel_titrator.hyp_discount_rate_large
bickel_titrator.hyp_discount_rate_medium
bickel_titrator.hyp_discount_rate_small
dickman_survey.dysfunctional
impulsive_venture_survey.impulsiveness
kirby.hyp_discount_rate_large
kirby.hyp_discount_rate_medium
kirby.hyp_discount_rate_small
motor_selective_stop_signal.selective_proactive_control
simple_reaction_time.avg_rt
stop_signal.SSRT_high
stop_signal.SSRT_low
tower_of_london.avg_move_time
attention_network_task.conflict_rt
choice_reaction_time.avg_rt
dot_pattern_expectancy.avg_rt
shape_matching.avg_rt
simon.avg_rt
threebytwo.avg_rt
threebytwo.cue_switch_cost_rt_100.0
threebytwo.cue_switch_cost_rt_900.0
attention_network_task.conflict_EZ_non_decision
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
37 negatively skewed variables were transformed:
columbia_card_task_hot.loss_sensitivity
mpq_control_survey.control
selection_optimization_compensation_survey.optimization
ten_item_personality_survey.conscientiousness
attention_network_task.conflict_hddm_drift
attention_network_task.hddm_non_decision
choice_reaction_time.hddm_non_decision
motor_selective_stop_signal.hddm_non_decision
shape_matching.hddm_non_decision
stim_selective_stop_signal.hddm_non_decision
stop_signal.hddm_non_decision
adaptive_n_back.acc
attention_network_task.acc
attention_network_task.conflict_acc
choice_reaction_time.acc
directed_forgetting.acc
dot_pattern_expectancy.AY.BY_acc
dot_pattern_expectancy.BX.BY_acc
dot_pattern_expectancy.BX.BY_rt
dot_pattern_expectancy.acc
go_nogo.acc
information_sampling_task.Fixed_Win_acc
local_global_letter.acc
local_global_letter.conflict_acc
local_global_letter.congruent_facilitation_acc
local_global_letter.incongruent_harm_acc
psychological_refractory_period_two_choices.task1_acc
psychological_refractory_period_two_choices.task2_acc
recent_probes.acc
shape_matching.acc
simon.acc
simon.simon_acc
stroop.acc
stroop.stroop_acc
attention_network_task.conflict_EZ_drift
local_global_letter.conflict_EZ_drift
stim_selective_stop_signal.EZ_non_decision
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
26 negatively skewed variables could not be transformed successfully:
columbia_card_task_hot.loss_sensitivity
attention_network_task.hddm_non_decision
choice_reaction_time.hddm_non_decision
motor_selective_stop_signal.hddm_non_decision
shape_matching.hddm_non_decision
stop_signal.hddm_non_decision
adaptive_n_back.acc
attention_network_task.acc
attention_network_task.conflict_acc
choice_reaction_time.acc
directed_forgetting.acc
dot_pattern_expectancy.BX.BY_acc
dot_pattern_expectancy.BX.BY_rt
dot_pattern_expectancy.acc
go_nogo.acc
information_sampling_task.Fixed_Win_acc
local_global_letter.acc
psychological_refractory_period_two_choices.task1_acc
psychological_refractory_period_two_choices.task2_acc
recent_probes.acc
shape_matching.acc
simon.acc
simon.simon_acc
stroop.acc
stroop.stroop_acc
stim_selective_stop_signal.EZ_non_decision
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Instead of applying the same functions to the t2 data, we check which variables were transformed in which way for t1 and transform the t2 data the same way. Again, we neither remove outliers from the t2 data nor drop variables whose skew is not removed successfully by the transformation.
Get list of transformed variables and the sign of their skew
logTr_vars = data.frame(grep('logTr', names(clean_test_data), value=T))
names(logTr_vars) = c('var')
logTr_vars$sign = ifelse(grepl(".ReflogTr",logTr_vars$var), 'negative', 'positive')
logTr_vars$var = gsub(".logTr|.ReflogTr", "", logTr_vars$var)
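The grepl/gsub pattern above recovers the skew direction and the original variable name from the transform suffix. (The leading "." in the patterns is an unescaped regex wildcard, which is harmless here because the suffixes only occur at the ends of names.) On hypothetical suffixed names:

```r
# Hypothetical suffixed names, showing how the suffix is parsed back
vars <- c("stroop.hddm_thresh.logTr", "simon.acc.ReflogTr")
sign <- ifelse(grepl(".ReflogTr", vars), "negative", "positive")
base <- gsub(".logTr|.ReflogTr", "", vars)
sign  # "positive" "negative"
base  # "stroop.hddm_thresh" "simon.acc"
```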
Transform the t2 columns the same way t1 columns were transformed.
clean_retest_data = retest_data
for(i in 1:length(names(clean_retest_data))){
tmp_col = names(clean_retest_data)[i]
if(tmp_col %in% logTr_vars$var){
sign = logTr_vars$sign[which(tmp_col == logTr_vars$var)]
if(sign == 'positive'){
# clean_retest_data[,tmp_col] = log(clean_retest_data[,tmp_col])
clean_retest_data[,tmp_col] = pos_log(clean_retest_data[,tmp_col])
names(clean_retest_data)[which(names(clean_retest_data) == tmp_col)] = paste0(tmp_col, '.logTr')
cat(paste0(tmp_col ,' was positively skewed and transformed.'))
cat('\n')
}
else if(sign == 'negative'){
clean_retest_data[,tmp_col] = neg_log(clean_retest_data[,tmp_col])
names(clean_retest_data)[which(names(clean_retest_data) == tmp_col)] = paste0(tmp_col, '.ReflogTr')
cat(paste0(tmp_col ,' was negatively skewed and transformed.'))
cat('\n')
}
}
}
adaptive_n_back.mean_load was positively skewed and transformed.
bickel_titrator.hyp_discount_rate_large was positively skewed and transformed.
bickel_titrator.hyp_discount_rate_medium was positively skewed and transformed.
bickel_titrator.hyp_discount_rate_small was positively skewed and transformed.
bis11_survey.Motor was positively skewed and transformed.
columbia_card_task_cold.gain_sensitivity was positively skewed and transformed.
columbia_card_task_hot.gain_sensitivity was positively skewed and transformed.
columbia_card_task_hot.loss_sensitivity was negatively skewed and transformed.
dickman_survey.dysfunctional was positively skewed and transformed.
dospert_eb_survey.health_safety was positively skewed and transformed.
impulsive_venture_survey.impulsiveness was positively skewed and transformed.
kirby.hyp_discount_rate_large was positively skewed and transformed.
kirby.hyp_discount_rate_medium was positively skewed and transformed.
kirby.hyp_discount_rate_small was positively skewed and transformed.
motor_selective_stop_signal.selective_proactive_control was positively skewed and transformed.
mpq_control_survey.control was negatively skewed and transformed.
selection_optimization_compensation_survey.optimization was negatively skewed and transformed.
simple_reaction_time.avg_rt was positively skewed and transformed.
stop_signal.SSRT_high was positively skewed and transformed.
stop_signal.SSRT_low was positively skewed and transformed.
ten_item_personality_survey.conscientiousness was negatively skewed and transformed.
tower_of_london.avg_move_time was positively skewed and transformed.
attention_network_task.conflict_hddm_drift was negatively skewed and transformed.
attention_network_task.hddm_non_decision was negatively skewed and transformed.
choice_reaction_time.hddm_non_decision was negatively skewed and transformed.
choice_reaction_time.hddm_thresh was positively skewed and transformed.
directed_forgetting.proactive_interference_hddm_drift was positively skewed and transformed.
dot_pattern_expectancy.hddm_thresh was positively skewed and transformed.
motor_selective_stop_signal.hddm_non_decision was negatively skewed and transformed.
motor_selective_stop_signal.hddm_thresh was positively skewed and transformed.
shape_matching.hddm_non_decision was negatively skewed and transformed.
shape_matching.hddm_thresh was positively skewed and transformed.
stim_selective_stop_signal.hddm_non_decision was negatively skewed and transformed.
stim_selective_stop_signal.hddm_thresh was positively skewed and transformed.
stop_signal.hddm_non_decision was negatively skewed and transformed.
stop_signal.hddm_thresh was positively skewed and transformed.
stroop.hddm_thresh was positively skewed and transformed.
threebytwo.hddm_thresh was positively skewed and transformed.
adaptive_n_back.acc was negatively skewed and transformed.
attention_network_task.acc was negatively skewed and transformed.
attention_network_task.conflict_acc was negatively skewed and transformed.
attention_network_task.conflict_rt was positively skewed and transformed.
choice_reaction_time.acc was negatively skewed and transformed.
choice_reaction_time.avg_rt was positively skewed and transformed.
directed_forgetting.acc was negatively skewed and transformed.
directed_forgetting.proactive_interference_acc was positively skewed and transformed.
dot_pattern_expectancy.AY.BY_acc was negatively skewed and transformed.
dot_pattern_expectancy.BX.BY_acc was negatively skewed and transformed.
dot_pattern_expectancy.BX.BY_rt was negatively skewed and transformed.
dot_pattern_expectancy.acc was negatively skewed and transformed.
dot_pattern_expectancy.avg_rt was positively skewed and transformed.
go_nogo.acc was negatively skewed and transformed.
information_sampling_task.Fixed_Win_acc was negatively skewed and transformed.
local_global_letter.acc was negatively skewed and transformed.
local_global_letter.conflict_acc was negatively skewed and transformed.
local_global_letter.congruent_facilitation_acc was negatively skewed and transformed.
local_global_letter.incongruent_harm_acc was negatively skewed and transformed.
psychological_refractory_period_two_choices.task1_acc was negatively skewed and transformed.
psychological_refractory_period_two_choices.task2_acc was negatively skewed and transformed.
recent_probes.acc was negatively skewed and transformed.
shape_matching.acc was negatively skewed and transformed.
shape_matching.avg_rt was positively skewed and transformed.
simon.acc was negatively skewed and transformed.
simon.avg_rt was positively skewed and transformed.
simon.simon_acc was negatively skewed and transformed.
stroop.acc was negatively skewed and transformed.
stroop.stroop_acc was negatively skewed and transformed.
threebytwo.avg_rt was positively skewed and transformed.
threebytwo.cue_switch_cost_rt_100.0 was positively skewed and transformed.
threebytwo.cue_switch_cost_rt_900.0 was positively skewed and transformed.
adaptive_n_back.EZ_drift was positively skewed and transformed.
adaptive_n_back.EZ_thresh was positively skewed and transformed.
attention_network_task.conflict_EZ_drift was negatively skewed and transformed.
attention_network_task.conflict_EZ_non_decision was positively skewed and transformed.
directed_forgetting.EZ_non_decision was positively skewed and transformed.
local_global_letter.conflict_EZ_drift was negatively skewed and transformed.
motor_selective_stop_signal.EZ_thresh was positively skewed and transformed.
stim_selective_stop_signal.EZ_non_decision was negatively skewed and transformed.
stim_selective_stop_signal.EZ_thresh was positively skewed and transformed.
stroop.EZ_non_decision was positively skewed and transformed.
threebytwo.EZ_non_decision was positively skewed and transformed.
Check transformations are correct
retest_logTr_vars = data.frame(grep('logTr', names(clean_retest_data), value=T))
names(retest_logTr_vars) = c('var')
retest_logTr_vars$sign = ifelse(grepl(".ReflogTr",retest_logTr_vars$var), 'negative', 'positive')
retest_logTr_vars$var = gsub(".logTr|.ReflogTr", "", retest_logTr_vars$var)
retest_logTr_vars = retest_logTr_vars %>% arrange(var)
logTr_vars = logTr_vars %>% arrange(var)
all(retest_logTr_vars == logTr_vars)
[1] TRUE
# Switch df names
raw_retest_data = retest_data
raw_test_data = test_data
test_data = clean_test_data
retest_data = clean_retest_data
rm(clean_test_data, clean_retest_data)
retest_subs_test_data <- test_data[test_data$sub_id %in% retest_data$sub_id,]
rest_subs_test_data <- test_data[test_data$sub_id %in% retest_data$sub_id == FALSE,]
#Arrange datasets of same size by sub_id
retest_data = retest_data %>% arrange(sub_id)
retest_subs_test_data = retest_subs_test_data %>% arrange(sub_id)
#CHECK IF EVERYTHING IS ORDERED RIGHT
all(as.character(retest_subs_test_data$sub_id) == as.character(retest_data$sub_id))
[1] TRUE
Since HDDM parameters depend on the sample they are fit on, we refit the model on t1 data for the subjects who also have t2 data. Here we replace the HDDM parameters in the current t1 dataset with these refitted values and transform them as they would have been transformed for the full sample, mimicking the procedure applied to the rest of the data.
Transform the HDDM refits as they had been transformed for the full-sample t1 data.
hddm_refits <- read.csv('/Users/zeynepenkavi/Documents/PoldrackLabLocal/Self_Regulation_Ontology/Data/Retest_03-21-2017/hddm_refits_exhaustive.csv')
hddm_refits$X <- as.character(hddm_refits$X)
names(hddm_refits)[1] <- 'sub_id'
hddm_logTr_vars = data.frame(grep('(?=.*hddm)(?=.*logTr)', names(retest_subs_test_data), value=T, perl=T))
names(hddm_logTr_vars) = c('var')
hddm_logTr_vars$sign = ifelse(grepl(".ReflogTr",hddm_logTr_vars$var), 'negative', 'positive')
hddm_logTr_vars$var = gsub(".logTr|.ReflogTr", "", hddm_logTr_vars$var)
transformed_hddm_refits = hddm_refits
for(i in 1:length(names(transformed_hddm_refits))){
tmp_col = names(transformed_hddm_refits)[i]
if(tmp_col %in% hddm_logTr_vars$var){
sign = hddm_logTr_vars$sign[which(tmp_col == hddm_logTr_vars$var)]
if(sign == 'positive'){
# clean_retest_data[,tmp_col] = log(clean_retest_data[,tmp_col])
transformed_hddm_refits[,tmp_col] = pos_log(transformed_hddm_refits[,tmp_col])
names(transformed_hddm_refits)[which(names(transformed_hddm_refits) == tmp_col)] = paste0(tmp_col, '.logTr')
cat(paste0(tmp_col ,' was positively skewed and transformed.'))
cat('\n')
}
else if(sign == 'negative'){
transformed_hddm_refits[,tmp_col] = neg_log(transformed_hddm_refits[,tmp_col])
names(transformed_hddm_refits)[which(names(transformed_hddm_refits) == tmp_col)] = paste0(tmp_col, '.ReflogTr')
cat(paste0(tmp_col ,' was negatively skewed and transformed.'))
cat('\n')
}
}
}
attention_network_task.conflict_hddm_drift was negatively skewed and transformed.
attention_network_task.hddm_non_decision was negatively skewed and transformed.
choice_reaction_time.hddm_non_decision was negatively skewed and transformed.
choice_reaction_time.hddm_thresh was positively skewed and transformed.
directed_forgetting.proactive_interference_hddm_drift was positively skewed and transformed.
dot_pattern_expectancy.hddm_thresh was positively skewed and transformed.
motor_selective_stop_signal.hddm_non_decision was negatively skewed and transformed.
motor_selective_stop_signal.hddm_thresh was positively skewed and transformed.
shape_matching.hddm_non_decision was negatively skewed and transformed.
shape_matching.hddm_thresh was positively skewed and transformed.
stim_selective_stop_signal.hddm_non_decision was negatively skewed and transformed.
stim_selective_stop_signal.hddm_thresh was positively skewed and transformed.
stop_signal.hddm_non_decision was negatively skewed and transformed.
stop_signal.hddm_thresh was positively skewed and transformed.
stroop.hddm_thresh was positively skewed and transformed.
threebytwo.hddm_thresh was positively skewed and transformed.
Check if transformation of hddm refits worked
transform_hddm_logTr_vars = data.frame(grep('logTr', names(transformed_hddm_refits), value=T))
names(transform_hddm_logTr_vars) = c('var')
transform_hddm_logTr_vars$sign = ifelse(grepl(".ReflogTr",transform_hddm_logTr_vars$var), 'negative', 'positive')
transform_hddm_logTr_vars$var = gsub(".logTr|.ReflogTr", "", transform_hddm_logTr_vars$var)
transform_hddm_logTr_vars = transform_hddm_logTr_vars %>% arrange(var)
hddm_logTr_vars = hddm_logTr_vars %>% arrange(var)
all(transform_hddm_logTr_vars == hddm_logTr_vars)
[1] TRUE
Replace t1 hddm parameters for retest subjects
replace_hddm_vars = data.frame(grep('(?=.*hddm)(?=.*logTr)', names(retest_subs_test_data), value=T, perl=T))
names(replace_hddm_vars) = c('var')
hddm_fix_retest_subs_test_data = retest_subs_test_data
for(i in 1:length(names(hddm_fix_retest_subs_test_data))){
if(names(hddm_fix_retest_subs_test_data)[i] %in% replace_hddm_vars$var){
hddm_fix_retest_subs_test_data[,names(hddm_fix_retest_subs_test_data)[i]] <- transformed_hddm_refits[,names(hddm_fix_retest_subs_test_data)[i]]
}
}
retest_subs_test_data <- hddm_fix_retest_subs_test_data
rm(hddm_refits, transformed_hddm_refits, hddm_logTr_vars, transform_hddm_logTr_vars, replace_hddm_vars, hddm_fix_retest_subs_test_data, sign, i)
For correlations we report Spearman’s \(\rho\)’s only.
Based on Weir (2005), ICC(3,k) does not take into account within-subject differences between the two time points (i.e., the fixed effect of time, or systematic error). It is therefore well approximated by Pearson's r and subject to similar criticisms. Weir (2005) suggests reporting at least the effect size of this systematic error if one chooses to report ICC(3,k). Based on his conclusions, here I report:
- ICC(3,k): As Dave clarified, this ranges from -1/(number of repeated measures - 1) to 1, so with two time points our range is [-1, 1]; larger values mean that a subject's two scores on a given measure are more similar to each other than they are to other subjects' scores
- partial \(\eta^2\) for time (\(SS_{time}/SS_{within}\)): effect size of time
- SEM (\(\sqrt{MS_{error}}\)): standard error of measurement; the smaller the better
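To make these three quantities concrete, here is a toy base-R sketch (fabricated numbers, not project data) of how they fall out of the two-way ANOVA decomposition. A constant shift between time points leaves ICC(3,k) untouched while the time effect size flags it:

```r
# Toy data: 5 subjects measured twice, with a uniform +1 shift at time 2.
t1 <- c(10, 12, 14, 16, 18)
t2 <- t1 + 1
scores  <- c(t1, t2)
n <- length(t1); k <- 2
subject <- rep(seq_len(n), times = k)
time    <- rep(1:k, each = n)

# Two-way decomposition: total SS split into subject, time, and error strata.
grand      <- mean(scores)
ss_subject <- k * sum((tapply(scores, subject, mean) - grand)^2)
ss_time    <- n * sum((tapply(scores, time, mean) - grand)^2)
ss_error   <- sum((scores - grand)^2) - ss_subject - ss_time

ms_subject <- ss_subject / (n - 1)
ms_error   <- ss_error / ((n - 1) * (k - 1))

icc_3k <- (ms_subject - ms_error) / ms_subject  # "Average_fixed_raters" in psych::ICC
eta_sq <- ss_time / (ss_time + ss_error)        # effect size for time
sem    <- sqrt(ms_error)                        # standard error of measurement
```

Here ICC(3,k) = 1 and SEM = 0 even though every subject changed by +1, while \(\eta^2\) for time is 1 — exactly why Weir (2005) recommends reporting the systematic error alongside ICC(3,k).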
numeric_cols = c()
for(i in 1:length(names(test_data))){
if(is.numeric(test_data[,i])){
numeric_cols <- c(numeric_cols, names(test_data)[i])
}
}
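For reference, the column-scanning loop above is equivalent to this base-R one-liner (shown on a toy data frame with hypothetical columns):

```r
# Toy stand-in for test_data: sub_id is character, the rest are numeric.
test_df <- data.frame(sub_id = c("s1", "s2"),
                      rt     = c(450, 512),
                      acc    = c(0.90, 0.95))

# Same result as the for-loop: names of the numeric columns, in order.
numeric_cols_alt <- names(test_df)[vapply(test_df, is.numeric, logical(1))]
```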
match_t1_t2 <- function(dv_var, t1_df = retest_subs_test_data, t2_df = retest_data, merge_var = 'sub_id', format = "long", sample = 'full', sample_vec){
if(sample == 'full'){
df = merge(t1_df[,c(merge_var, dv_var)], t2_df[,c(merge_var, dv_var)], by = merge_var)
}
else{
df = merge(t1_df[t1_df[,merge_var] %in% sample_vec, c(merge_var, dv_var)], t2_df[t2_df[,merge_var] %in% sample_vec, c(merge_var, dv_var)],
by=merge_var)
}
df = df %>%
na.omit()%>%
gather(dv, score, -sub_id) %>%
mutate(time = ifelse(grepl('\\.x', dv), 1, ifelse(grepl('\\.y', dv), 2, NA))) %>%
separate(dv, c("dv", "drop"), sep='\\.([^.]*)$') %>%
select(-drop)
if(format == 'wide'){
df = df%>% spread(time, score)
}
return(df)
}
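The reshaping in match_t1_t2 leans on merge()'s default suffix behavior; a minimal sketch with toy values (the sub_id and dv names here are made up):

```r
# merge() appends .x / .y to columns shared by both inputs (other than the
# 'by' key); match_t1_t2 later maps .x to time 1 and .y to time 2.
t1_df  <- data.frame(sub_id = c("s1", "s2"), dv1 = c(0.50, 0.70))
t2_df  <- data.frame(sub_id = c("s1", "s2"), dv1 = c(0.60, 0.65))
merged <- merge(t1_df, t2_df, by = "sub_id")
names(merged)  # "sub_id" "dv1.x" "dv1.y"
```

In the real data the dv names themselves contain dots, which is why the function separates on the last dot only (`sep='\\.([^.]*)$'`).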
get_spearman = function(dv_var, t1_df = retest_subs_test_data, t2_df = retest_data, merge_var = 'sub_id', sample='full', sample_vec){
if(sample=='full'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var, format='wide')
}
else if(sample=='bootstrap'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var, format='wide', sample='bootstrap', sample_vec = sample_vec)
}
rho = cor(df$`1`, df$`2`, method='spearman')
return(rho)
}
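As a reminder of why Spearman's \(\rho\) is a reasonable default here: it is Pearson's r computed on ranks, so it is unaffected by monotone transformations such as the log transforms applied to the hddm variables above. A toy check:

```r
x <- c(1, 4, 2, 8)
y <- exp(x)                                 # a monotone transform of x
rho      <- cor(x, y, method = "spearman")  # Spearman's rho
rho_rank <- cor(rank(x), rank(y))           # Pearson's r on the ranks
# rho equals rho_rank, and both are 1 despite the nonlinear relationship
```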
get_icc <- function(dv_var, t1_df = retest_subs_test_data, t2_df = retest_data, merge_var = 'sub_id', sample='full', sample_vec){
if(sample=='full'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var, format='wide')
}
else if(sample=='bootstrap'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var, format='wide', sample='bootstrap', sample_vec = sample_vec)
}
df = df %>% select(-dv, -sub_id)
icc = ICC(df)
icc_3k = icc$results['Average_fixed_raters', 'ICC']
return(icc_3k)
}
get_eta <- function(dv_var, t1_df = retest_subs_test_data, t2_df = retest_data, merge_var = 'sub_id', sample='full', sample_vec){
if(sample=='full'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var)
}
else if(sample=='bootstrap'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var, sample='bootstrap', sample_vec = sample_vec)
}
mod = summary(aov(score~Error(sub_id)+time, df))
ss_time = as.data.frame(unlist(mod$`Error: Within`))['Sum Sq1',]
ss_error = as.data.frame(unlist(mod$`Error: Within`))['Sum Sq2',]
eta = ss_time/(ss_time+ss_error)
return(eta)
}
get_sem <- function(dv_var, t1_df = retest_subs_test_data, t2_df = retest_data, merge_var = 'sub_id', sample='full', sample_vec){
if(sample=='full'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var)
}
else if(sample=='bootstrap'){
df = match_t1_t2(dv_var, t1_df = t1_df, t2_df = t2_df, merge_var = merge_var, sample='bootstrap', sample_vec = sample_vec)
}
mod = summary(aov(score~Error(sub_id)+time, df))
ms_error = as.data.frame(unlist(mod$`Error: Within`))['Mean Sq2',]
sem = sqrt(ms_error)
return(sem)
}
rel_df <- data.frame(spearman = rep(NA, length(numeric_cols)),
icc = rep(NA, length(numeric_cols)),
eta_sq = rep(NA, length(numeric_cols)),
sem = rep(NA, length(numeric_cols)))
row.names(rel_df) <- numeric_cols
for(i in 1:length(numeric_cols)){
rel_df[numeric_cols[i], 'spearman'] <- get_spearman(numeric_cols[i])
rel_df[numeric_cols[i], 'icc'] <- get_icc(numeric_cols[i])
rel_df[numeric_cols[i], 'eta_sq'] <- get_eta(numeric_cols[i])
rel_df[numeric_cols[i], 'sem'] <- get_sem(numeric_cols[i])
}
rel_df$dv = row.names(rel_df)
row.names(rel_df) = seq(1:nrow(rel_df))
rel_df$task = 'task'
rel_df[grep('survey', rel_df$dv), 'task'] = 'survey'
rel_df = rel_df %>%
select(dv, task, spearman, icc, eta_sq, sem)
# row.names(rel_df) = NULL
Average reliability metrics and the number of variables for the surveys and the tasks. Both the Spearman \(\rho\)’s and the ICC’s are higher for surveys.
Systematic differences between the two measurements (\(\eta^2\)) are larger for tasks than for surveys (potentially indicating learning effects?).
Unexpectedly, the median standard error of measurement is higher for surveys, but the distribution below suggests that this is a side effect of having many task measures with very low SEMs.
rel_df %>%
group_by(task) %>%
summarise(median_spearman = median(spearman),
median_icc = median(icc),
median_eta_sq = median(eta_sq),
median_sem = median(sem),
num_vars = n()) %>%
datatable() %>%
formatRound(columns=c('median_spearman', 'median_icc', 'median_eta_sq', 'median_sem'), digits=3)
Rank orders for reliabilities
rel_df %>%
datatable(filter='top') %>%
formatRound(columns=c('spearman', 'icc', 'eta_sq', 'sem'), digits=3)
ggplotly(ggplot_build(rel_df %>%
ggplot(aes(spearman, fill=task)) +
geom_histogram(position = 'identity', alpha=0.5))$data[[1]] %>%
mutate(task=ifelse(as.character(fill) == "#00BFC4", "task", "survey")) %>%
get_labels(.,rel_df, "spearman", "task") %>%
prettify_label(.) %>%
ggplot(aes(x,y, fill=task, label=label))+
geom_bar(stat='identity', position='identity', alpha=0.5)+
xlab('Spearman')+
ylab('Count')+
theme_bw(),
tooltip = 'label',
hoverinfo = 'text')
ggplotly(ggplot_build(rel_df %>%
ggplot(aes(icc, fill=task)) +
geom_histogram(position = 'identity', alpha=0.5))$data[[1]] %>%
mutate(task=ifelse(as.character(fill) == "#00BFC4", "task", "survey")) %>%
get_labels(.,rel_df, "icc", "task") %>%
prettify_label(.) %>%
ggplot(aes(x,y, fill=task, label=label))+
geom_bar(stat='identity', position='identity', alpha=0.5)+
xlab('ICC')+
ylab('Count')+
theme_bw(),
tooltip = 'label',
hoverinfo = 'text')
The tooltip might not work well for the highest bar of this graph because the label is too large. But this bar indicates measures where the effect of time is very small anyway. We might want to pay closer attention to the variables that are not in this lowest bar, since those are the ones with a non-negligible effect of time, indicating either a learning effect or another kind of systematic change that we did not hypothesize to observe with any of these measures.
ggplotly(ggplot_build(rel_df %>%
ggplot(aes(eta_sq, fill=task)) +
geom_histogram(position = 'identity', alpha=0.5))$data[[1]] %>%
mutate(task=ifelse(as.character(fill) == "#00BFC4", "task", "survey")) %>%
get_labels(.,rel_df, "eta_sq", "task") %>%
prettify_label(.) %>%
ggplot(aes(x,y, fill=task, label=label))+
geom_bar(stat='identity', position='identity', alpha=0.5)+
xlab('Partial Eta Squared')+
ylab('Count')+
theme_bw(),
tooltip = 'label',
hoverinfo = 'text')
So what variables show a non-negligible effect of time and in what direction? (sorted by increasing partial \(\eta^2\))
time_effect_vars = ggplot_build(rel_df %>%
ggplot(aes(eta_sq, fill=task)) +
geom_histogram(position = 'identity', alpha=0.5))$data[[1]] %>%
mutate(task=ifelse(as.character(fill) == "#00BFC4", "task", "survey")) %>%
get_labels(.,rel_df, "eta_sq", "task") %>%
prettify_label(.) %>%
filter(bin != 1 & bin !=2 & label != "NA") %>%
select(label, task) %>%
separate(label,into=c(as.character(1:27)),sep = "</br>") %>%
gather(key, value, -task) %>%
select(value, task) %>%
na.omit()
time_effect_vars$value = gsub("\"", "", time_effect_vars$value)
datatable(time_effect_vars)
Box plots showing the distributions of difference scores between the measurements at two time points for each measure can be found under the Completion times section.
ggplotly(ggplot_build(rel_df %>%
ggplot(aes(sem, fill=task)) +
geom_histogram(position = 'identity', alpha=0.5))$data[[1]] %>%
mutate(task=ifelse(as.character(fill) == "#00BFC4", "task", "survey")) %>%
get_labels(.,rel_df, "sem", "task") %>%
prettify_label(.) %>%
ggplot(aes(x,y, fill=task, label=label))+
geom_bar(stat='identity', position='identity', alpha=0.5)+
xlab('SEM')+
ylab('Count')+
theme_bw(),
tooltip = 'label',
hoverinfo = 'text')
neg_icc_vars = rel_df %>%
arrange(icc) %>%
filter(icc<0) %>%
select(dv)
Some variables (attention_network_task.alerting_EZ_thresh, attention_network_task.orienting_acc, recent_probes.proactive_interference_EZ_drift, recent_probes.proactive_interference_acc, recent_probes.proactive_interference_hddm_drift, threebytwo.cue_switch_cost_EZ_non_decision_900.0, two_stage_decision.model_based, threebytwo.task_switch_cost_EZ_non_decision_900.0, attention_network_task.orienting_EZ_thresh, local_global_letter.conflict_EZ_thresh, recent_probes.proactive_interference_EZ_thresh, threebytwo.task_switch_cost_rt_900.0, threebytwo.task_switch_cost_hddm_drift_900.0, local_global_letter.switch_cost_EZ_thresh, threebytwo.task_switch_cost_EZ_drift_900.0, probabilistic_selection.positive_learning_bias, two_stage_decision.perseverance, directed_forgetting.proactive_interference_EZ_drift, dot_pattern_expectancy.BX.BY_EZ_non_decision, probabilistic_selection.value_sensitivity) have negative ICC’s. This is the case when \(MS_{error} > MS_{between}\).
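To see how a negative ICC(3,k) arises, here is a toy base-R sketch (fabricated scores) where the error mean square exceeds the between-subject mean square:

```r
# Toy scores for 5 subjects at two time points with essentially no
# rank-order consistency across time; t2 is a shuffle of t1's values.
t1 <- c(1, 2, 3, 4, 5)
t2 <- c(2, 4, 5, 1, 3)
scores  <- c(t1, t2)
n <- length(t1); k <- 2
subject <- rep(seq_len(n), times = k)
time    <- rep(1:k, each = n)

grand      <- mean(scores)
ss_subject <- k * sum((tapply(scores, subject, mean) - grand)^2)
ss_time    <- n * sum((tapply(scores, time, mean) - grand)^2)
ss_error   <- sum((scores - grand)^2) - ss_subject - ss_time

ms_subject <- ss_subject / (n - 1)
ms_error   <- ss_error / ((n - 1) * (k - 1))
icc_3k     <- (ms_subject - ms_error) / ms_subject  # negative: MS_error > MS_between
```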
Plot data for measures with negative ICC.
x-axis is time 1 data and y-axis is time 2 data. The scatter plots show that the scores for measures with negative ICC’s have no relationship across time points.
tmp = data.frame()
for(i in 1:nrow(neg_icc_vars)){
tmp = rbind(tmp, match_t1_t2(neg_icc_vars$dv[i]))
}
rm(neg_icc_vars)
tmp %>%
spread(time, score) %>%
# mutate(dv_wrap = str_wrap(dv, width = 30)) %>%
ggplot(aes(`1`, `2`))+
geom_point()+
theme_bw()+
facet_wrap(~dv, scales='free', labeller = label_wrap_gen())
This point is clarified further by comparing these scatter plots to those of the variables with the highest ICC’s. These variables show strong and clear positive linear relationships.
high_icc_vars = rel_df %>%
arrange(-icc) %>%
filter(icc>0.92) %>%
select(dv)
tmp = data.frame()
for(i in 1:nrow(high_icc_vars)){
tmp = rbind(tmp, match_t1_t2(high_icc_vars$dv[i]))
}
rm(high_icc_vars)
tmp %>%
spread(time, score) %>%
ggplot(aes(`1`, `2`))+
geom_point()+
theme_bw()+
facet_wrap(~dv, scales='free')
These two metrics give mostly similar answers regarding the reliabilities of the measures, except for the variables clustered around an ICC of 0.
ggplotly(rel_df %>%
ggplot(aes(spearman, icc, col=task, label=dv, label2=eta_sq))+
geom_point()+
theme_bw()+
theme(legend.title = element_blank())+
geom_abline(intercept = 0, slope=1))
What are the variables clustered around 0 ICC?
rel_df %>%
filter(icc>(-0.1), icc<0.1) %>%
arrange(icc) %>%
datatable() %>%
formatRound(columns=c('spearman', 'icc', 'eta_sq', 'sem'), digits=3)
ggplotly(rel_df %>%
ggplot(aes(eta_sq, icc, col=task, label=dv))+
geom_point()+
theme_bw()+
theme(legend.title = element_blank()))
The results presented so far aim to be comprehensive but are difficult to parse for a clear sense of what is going on (other than a general impression that surveys tend to have higher reliabilities than tasks). In this and the following section we try to make the results more digestible with two kinds of categorizations.
We aim to make contact with the literature and make the results more accessible to readers who have experience with our tasks, while we do not want to:
1. take away from the novelty of the main papers coming out of this project outlining the factor structure that this data unveils, or
2. make prescriptive recommendations or theoretical arguments on why certain measures of certain processes are more reliable than others.
With this in mind we reduce the list of over 270 task measures to one per task by averaging only the raw measures computed from all the trials of a task. We chose to reduce the information in this manner to avoid any bias stemming from the differential amount of interest in, and procedures applied to, certain tasks over others (e.g. one task can have over 10 measures because it has multiple conditions and we have chosen to fit DDMs for specific conditions, while another might have only 2 due to our relative inexperience with and lack of interest in it). We also check whether the number of trials in a task has a significant effect on these average reliabilities of raw measures.
We plan to supplement/potentially replace this with an additional labeling procedure intending to capture the underlying putative cognitive processes but are deferring this until at least for another week. This exercise might not even prove helpful but at least a per task summary seemed a reasonable first pass.
We filter out the DDM parameters and measures for specific contrasts. Note that this still leaves some tasks with measures that are model fits and/or condition-specific (because the current datasets do not include measures that are both raw and based on all trials, though I could dive into variables_exhaustive for such measures; for example, the average reliability for Kirby is based on three discount rates for specific conditions). Here’s the ranking of tasks by median reliability, sorted by ICC and then by Spearman’s \(\rho\).
tmp = measure_labels %>%
mutate(dv = as.character(dv)) %>%
filter(task == 'task') %>%
select(-measure_description, -notes, -measure_type, -task, -ddm_task) %>%
left_join(rel_df[,c("dv", "spearman","icc")], by='dv') %>%
separate(variable_type, c("label_1", "label_2", "label_3"), sep = ",") %>%
mutate(label_1 = trim(label_1),
label_2=trim(label_2),
label_3=trim(label_3))%>%
filter(label_1 %in% c('difference', 'EZ', 'hddm') ==FALSE & label_2 %in% c('difference', 'EZ', 'hddm') == FALSE) %>%
separate(dv, c('task_name', 'extra_1', 'extra_2'), sep = '\\.',remove=FALSE) %>%
select(-extra_1, -extra_2) %>%
group_by(task_name) %>%
summarise(median_icc = median(icc),
median_spearman = median(spearman),
num_measures = n(),
num_trials = unique(num_all_trials)) %>%
arrange(-median_icc, -median_spearman)
tmp %>%
datatable() %>%
formatRound(columns=c('median_spearman', 'median_icc'), digits=3)
Does the number of trials in a task have a significant effect on the average ICC of the (mostly) raw measures for all trials from a task? No.
summary(lm(median_icc ~ num_trials, data = tmp))
Call:
lm(formula = median_icc ~ num_trials, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.6006 -0.0827 0.0584 0.1244 0.2874
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5864724 0.0483369 12.13 1e-13 ***
num_trials 0.0000915 0.0002455 0.37 0.71
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.203 on 33 degrees of freedom
Multiple R-squared: 0.0042, Adjusted R-squared: -0.026
F-statistic: 0.139 on 1 and 33 DF, p-value: 0.712
Does the number of trials in a task have a significant effect on the average Spearman \(\rho\) of the (mostly) raw measures for all trials from a task? No.
summary(lm(median_spearman ~ num_trials, data = tmp))
Call:
lm(formula = median_spearman ~ num_trials, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.5041 -0.0641 0.0109 0.1258 0.3005
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4804367 0.0411301 11.68 2.9e-13 ***
num_trials 0.0000674 0.0002089 0.32 0.75
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.172 on 33 degrees of freedom
Multiple R-squared: 0.00315, Adjusted R-squared: -0.0271
F-statistic: 0.104 on 1 and 33 DF, p-value: 0.749
Does the number of measures that went into this reliability estimate have a significant effect on the average ICC of the (mostly) raw measures for all trials from a task? It seems the more measures that go into summarizing the overall reliability for a task, the worse it gets. This would make sense if these variables are intended to measure different things and some are less reliable than others.
summary(lm(median_icc ~ num_measures, data = tmp))
Call:
lm(formula = median_icc ~ num_measures, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.5790 -0.0519 0.0453 0.1199 0.2627
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.7331 0.0715 10.3 8.6e-12 ***
num_measures -0.0515 0.0245 -2.1 0.044 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.191 on 33 degrees of freedom
Multiple R-squared: 0.118, Adjusted R-squared: 0.0911
F-statistic: 4.41 on 1 and 33 DF, p-value: 0.0435
Does the number of measures that went into this reliability estimate have a significant effect on the average Spearman \(\rho\) of the (mostly) raw measures for all trials from a task?
Yes; the same is true for the Spearman correlations as well.
summary(lm(median_spearman ~ num_measures, data = tmp))
Call:
lm(formula = median_spearman ~ num_measures, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.4358 -0.0854 0.0187 0.1093 0.2627
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6090 0.0604 10.08 1.3e-11 ***
num_measures -0.0459 0.0207 -2.21 0.034 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.161 on 33 degrees of freedom
Multiple R-squared: 0.129, Adjusted R-squared: 0.103
F-statistic: 4.89 on 1 and 33 DF, p-value: 0.034
In addition to average reliabilities for each task we can look at the reliabilities of measures organized by the putative cognitive process they are intended to measure. Note, however, that these labels are not determined in a data-driven manner.
measure_labels %>%
mutate(dv = as.character(dv)) %>%
filter(task == 'task') %>%
select(-measure_description, -notes, -variable_type, -task, -ddm_task) %>%
left_join(rel_df[,c("dv", "spearman","icc")], by='dv') %>%
group_by(measure_type) %>%
summarise(median_icc = median(icc),
median_spearman = median(spearman),
num_measures = n()) %>%
arrange(-median_icc, -median_spearman) %>%
datatable() %>%
formatRound(columns=c('median_spearman', 'median_icc'), digits=3)
Here is what the distributions look like.
- Alerting: This label consists only of the ANT alerting contrast (no cue - double cue trials). 4/6 are drift diffusion parameters.
- Cue encoding: This label consists only of the three by two cue switch cost contrast (stay (where cue and task remain the same) - switch (where the task is the same but the cue changes) trials). Again most are drift diffusion parameters.
- Delay discounting: This label consists of the discount rates for the Bickel titrator (3), the Kirby items (3) and the discount titrate task (a model fit to a random set of stimuli). Given the specific interest in these tasks in the literature, does the discrepancy between ICC and Spearman for these measures warrant a more detailed look?
- Information use: This label includes Columbia Card Task (both versions) and information sampling task.
- Inhibition: This label includes go no-go, motor selective stop signal, stimulus selective stop signal and regular stop signal (excluding specific contrasts).
- Intelligence: This label includes ravens and tower of london.
- Learning: This label includes ART, CCT, two stage decision, dietary decision task, hierarchical rule task, information sampling task, probabilistic selection task, shift task, two stage task.
- Neutral writing: This label consists only of the writing task.
- Orienting: This label consists only of the ANT orienting contrast (center - spatial cue trials). 4/6 are drift diffusion parameters.
- Positive writing: This label consists only of the writing task.
- Proactive control: This label consists of one DPX contrast (AY - BY trials), motor selective stop signal and regular stop signal.
- Reactive control: This label consists of one DPX contrast (BX - BY trials) and motor selective stop signal.
- Resisting proactive interference: This label consists of one contrast in the directed forgetting and recent probes tasks (negative (where the stimulus to respond to is in the forgetting set) - control (where the stimulus to respond to is new) trials).
- Response conflict: This label consists of ANT conflict contrast (congruent-incongruent), DPX, local global conflict contrast, shape matching, simon, stroop.
- Response selection: This label consists of choice reaction time task.
- Risk Taking: This label consists of ART and CCT.
- Selective attention: This label consists of local global task and ANT (all trials).
- Set shifting: This label consists of local global, PRT and three by two.
- Speed: This label consists of raw response times in information sampling task, simple reaction time, tower of london.
- Working memory: This label consists of adaptive n back, digit span, directed forgetting, keep track, recent probes, spatial span.
measure_labels %>%
mutate(dv = as.character(dv)) %>%
filter(task == 'task') %>%
select(-measure_description, -notes, -variable_type, -task, -ddm_task) %>%
left_join(rel_df[,c("dv", "spearman","icc")], by='dv') %>%
select(measure_type, icc, spearman) %>%
gather(key, value, -measure_type) %>%
ggplot(aes(value, fill=key))+
geom_histogram(position='identity', alpha=0.5)+
theme_bw()+
facet_wrap(~measure_type)+
theme(legend.title = element_blank())
To be filled with factor labels after processing either ASU factor analyses or Russ’ MIRT?
In this section we label the measures based on the procedure behind the creation of each variable. This moves further away from the literature and from thinking about our measures in terms of specific tasks, surveys or cognitive processes, and is a more novel way of presenting our data.
Any prescriptive recommendations coming out of this section are intended to build intuitions about which treatments of the data yield higher reliabilities, especially for researchers who plan on making their own tasks and collecting different kinds of measures.
tmp = measure_labels %>%
mutate(dv = as.character(dv)) %>%
filter(task == 'task') %>%
select(-measure_description, -notes, -measure_type, -task) %>%
left_join(rel_df[,c("dv", "spearman","icc")], by='dv') %>%
separate(variable_type, c("label_1", "label_2", "label_3"), sep = ",") %>%
mutate(label_1 = trim(label_1),
label_2=trim(label_2),
label_3=trim(label_3))%>%
filter(ddm_task == 1) %>%
separate(dv, c('task_name', 'extra_1', 'extra_2'), sep = '\\.', remove=FALSE) %>%
select(-extra_1, -extra_2) %>%
drop_na()
In addition to the raw response time and accuracy measures we get from certain tasks, we have also chosen to fit drift diffusion models to them. The tasks we have fit these models to are: unique(tmp$task_name). This list does not include all cognitive tasks in the battery.
Tables of the median and mean reliabilities of the two kinds of DDM models’ (hddm and EZ) parameters compared to raw response times and accuracies, depending on whether they were fit to all of the data or to a contrast of two conditions, sorted by ICC and then by Spearman’s \(\rho\).
tmp %>%
group_by(label_1,label_2, label_3) %>%
summarise(median_icc = median(icc),
median_spearman = median(spearman),
n_vars = n()) %>%
arrange(-median_icc, -median_spearman) %>%
datatable() %>%
formatRound(columns=c('median_spearman', 'median_icc'), digits=3)
tmp %>%
group_by(label_1,label_2, label_3) %>%
summarise(mean_icc = mean(icc),
mean_spearman = mean(spearman),
n_vars = n()) %>%
arrange(-mean_icc, -mean_spearman) %>%
datatable() %>%
formatRound(columns=c('mean_spearman', 'mean_icc'), digits=3)
Are these differences meaningful?
summary(lm(icc ~ label_1,tmp))
Call:
lm(formula = icc ~ label_1, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.9249 -0.1081 0.0111 0.1152 0.4081
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2308 0.0192 12 <2e-16 ***
label_1overall 0.3917 0.0261 15 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.187 on 204 degrees of freedom
Multiple R-squared: 0.525, Adjusted R-squared: 0.522
F-statistic: 225 on 1 and 204 DF, p-value: <2e-16
In the model below (per the model comparison, the model with the interaction is significantly better) the baseline is difference accuracy.
Main effects: Compared to this, overall accuracy has higher reliability and difference threshold has lower reliability. The other difference parameters do not differ significantly from difference accuracy.
Interactions: Drift rates and thresholds are higher for overall than for difference measures; response times are marginally so.
Conclusion: Overall parameter estimates are more reliable than difference estimates (except for non-decision time).
summary(lm(icc ~ label_1*label_3,tmp))
Call:
lm(formula = icc ~ label_1 * label_3, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.8859 -0.1012 0.0123 0.0995 0.4507
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3260 0.0554 5.89 1.7e-08
label_1overall 0.2574 0.0750 3.43 0.00072
label_3drift rate -0.0555 0.0634 -0.88 0.38231
label_3non-decision -0.0975 0.0706 -1.38 0.16879
label_3rt -0.0768 0.0673 -1.14 0.25511
label_3threshold -0.2562 0.0706 -3.63 0.00036
label_1overall:label_3drift rate 0.1841 0.0876 2.10 0.03683
label_1overall:label_3non-decision 0.0493 0.0929 0.53 0.59625
label_1overall:label_3rt 0.1875 0.0955 1.96 0.05101
label_1overall:label_3threshold 0.2712 0.0929 2.92 0.00391
(Intercept) ***
label_1overall ***
label_3drift rate
label_3non-decision
label_3rt
label_3threshold ***
label_1overall:label_3drift rate *
label_1overall:label_3non-decision
label_1overall:label_3rt .
label_1overall:label_3threshold **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.175 on 196 degrees of freedom
Multiple R-squared: 0.599, Adjusted R-squared: 0.58
F-statistic: 32.5 on 9 and 196 DF, p-value: <2e-16
Closer look at simple effects:
Are overall drift rates meaningfully higher than overall raw accuracies? Yes.
tmp_simp = tmp %>%
filter(label_3 %in% c("drift rate", "accuracy"),
label_1 == 'overall')
summary(lm(icc~label_3, tmp_simp))
Call:
lm(formula = icc ~ label_3, data = tmp_simp)
Residuals:
Min 1Q Median 3Q Max
-0.8859 -0.0283 0.0313 0.0771 0.2195
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5834 0.0495 11.79 2.9e-14 ***
label_3drift rate 0.1286 0.0591 2.17 0.036 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.171 on 38 degrees of freedom
Multiple R-squared: 0.111, Adjusted R-squared: 0.0872
F-statistic: 4.72 on 1 and 38 DF, p-value: 0.036
Are overall drift rates meaningfully higher than overall raw response times? No, but not worse either.
tmp_simp = tmp %>%
filter(label_3 %in% c("drift rate", "rt"),
label_1 == 'overall')
summary(lm(icc~label_3, tmp_simp))
Call:
lm(formula = icc ~ label_3, data = tmp_simp)
Residuals:
Min 1Q Median 3Q Max
-0.2610 -0.0304 0.0147 0.0486 0.1471
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.7120 0.0162 43.96 <2e-16 ***
label_3rt -0.0178 0.0274 -0.65 0.52
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0857 on 41 degrees of freedom
Multiple R-squared: 0.0102, Adjusted R-squared: -0.0139
F-statistic: 0.422 on 1 and 41 DF, p-value: 0.519
Are overall thresholds meaningfully higher than overall raw accuracies? No, but not worse either.
tmp_simp = tmp %>%
filter(label_3 %in% c("threshold", "accuracy"),
label_1 == 'overall')
summary(lm(icc~label_3, tmp_simp))
Call:
lm(formula = icc ~ label_3, data = tmp_simp)
Residuals:
Min 1Q Median 3Q Max
-0.8859 -0.0658 0.0236 0.0921 0.2195
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5834 0.0518 11.26 1.1e-13 ***
label_3threshold 0.0151 0.0619 0.24 0.81
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.179 on 38 degrees of freedom
Multiple R-squared: 0.00155, Adjusted R-squared: -0.0247
F-statistic: 0.0592 on 1 and 38 DF, p-value: 0.809
Are overall thresholds meaningfully higher than overall raw response times? No; they are worse.
tmp_simp = tmp %>%
filter(label_3 %in% c("threshold", "rt"),
label_1 == 'overall')
summary(lm(icc~label_3, tmp_simp))
Call:
lm(formula = icc ~ label_3, data = tmp_simp)
Residuals:
Min 1Q Median 3Q Max
-0.26099 -0.06799 0.00746 0.06522 0.20265
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6942 0.0257 27 <2e-16 ***
label_3threshold -0.0957 0.0319 -3 0.0046 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0997 on 41 degrees of freedom
Multiple R-squared: 0.18, Adjusted R-squared: 0.16
F-statistic: 9 on 1 and 41 DF, p-value: 0.00459
Are overall non-decision times meaningfully higher than overall raw accuracies? No, but not worse either.
tmp_simp = tmp %>%
filter(label_3 %in% c("non-decision", "accuracy"),
label_1 == 'overall')
summary(lm(icc~label_3, tmp_simp))
Call:
lm(formula = icc ~ label_3, data = tmp_simp)
Residuals:
Min 1Q Median 3Q Max
-0.8859 -0.1202 0.0566 0.1102 0.2401
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5834 0.0571 10.22 1.9e-12 ***
label_3non-decision -0.0482 0.0682 -0.71 0.48
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.198 on 38 degrees of freedom
Multiple R-squared: 0.013, Adjusted R-squared: -0.013
F-statistic: 0.499 on 1 and 38 DF, p-value: 0.484
Are overall non-decision times meaningfully higher than overall raw response times? No; they are worse.
tmp_simp = tmp %>%
filter(label_3 %in% c("non-decision", "rt"),
label_1 == 'overall')
summary(lm(icc~label_3, tmp_simp))
Call:
lm(formula = icc ~ label_3, data = tmp_simp)
Residuals:
Min 1Q Median 3Q Max
-0.2610 -0.1100 0.0140 0.0947 0.2401
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5353 0.0242 22.15 < 2e-16 ***
label_3rt 0.1589 0.0409 3.88 0.00037 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.128 on 41 degrees of freedom
Multiple R-squared: 0.269, Adjusted R-squared: 0.251
F-statistic: 15.1 on 1 and 41 DF, p-value: 0.000366
Would this be a comparison of factor reliability versus item reliability?
Do people differ in how much their scores change depending on how many days it has been since they completed the initial battery?
Make a data frame with the difference between the two scores for each measure for each subject. Since the scores for different measures are on different scales, for comparability the difference scores are scaled (i.e. divided by their root mean square) but not centered, so a difference score of 0 indicates a lack of difference between the two scores of a subject.
t1_2_difference = data.frame()
for(i in 1:length(numeric_cols)){
tmp = match_t1_t2(numeric_cols[i],format='wide')
tmp = tmp %>%
# mutate(difference = scale(`2` - `1`))
mutate(difference = scale(`2` - `1`, center=F))
t1_2_difference = rbind(t1_2_difference, tmp)
}
t1_2_difference$difference = as.data.frame(t1_2_difference$difference)$V1
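As a sanity check on the scaling used above: with center = FALSE, R's scale() divides by the root mean square, \(\sqrt{\sum x^2 / (n-1)}\), rather than by the standard deviation, so a value of 0 still marks "no change". A toy check:

```r
# Fabricated difference scores (not project data).
x   <- c(2, -2, 4, -4)
rms <- sqrt(sum(x^2) / (length(x) - 1))       # root mean square, per ?scale
s   <- as.numeric(scale(x, center = FALSE))   # what the pipeline above computes
# s equals x / rms element-wise; zeros in x stay zero after scaling
```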
Add completion dates to this data frame.
t1_2_difference = merge(t1_2_difference, comp_dates[,c('sub_id', 'days_btw')], by='sub_id')
We test the effect of the number of days in between in a full model where the difference scores are regressed on fixed effects for measure and for days between the two scores, plus random intercepts for each subject (should this model include the interaction between the fixed effects?). Since there are over 300 measures, the output of the full model is not presented here. Instead, below are the coefficient for the fixed effect of days between completion times and its t value.
mod = lmer(difference ~ dv + days_btw + (1|sub_id), data = t1_2_difference)
data.frame(estimate=coef(summary(mod))["days_btw","Estimate"], tval=coef(summary(mod))["days_btw","t value"])
For visualization purposes I summarized the difference scores per person as the average difference and plotted that against the number of days between completions.
tmp = t1_2_difference %>%
group_by(sub_id) %>%
summarise(mean_diff = mean(difference, na.rm=T),
days_btw = unique(days_btw))
ggplotly(tmp %>%
ggplot(aes(days_btw, mean_diff))+
geom_point()+
theme_bw()+
xlab("Days between completion")+
ylab("Mean standardized difference between two time points")+
geom_smooth(method="lm"))
To confirm: the slope of this line is not significant. That is, there doesn’t seem to be a systematic difference between the two measurements depending on the number of days between the two measurements.
summary(lm(mean_diff ~ days_btw, data=tmp))
Call:
lm(formula = mean_diff ~ days_btw, data = tmp)
Residuals:
Min 1Q Median 3Q Max
-0.24670 -0.05189 0.00176 0.05104 0.22911
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.066878 0.027067 -2.47 0.015 *
days_btw 0.000389 0.000229 1.70 0.091 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0824 on 148 degrees of freedom
Multiple R-squared: 0.0192, Adjusted R-squared: 0.0126
F-statistic: 2.89 on 1 and 148 DF, p-value: 0.091
For reference here are the box plots showing the distribution of difference scores for each task.
t1_2_difference %>%
  ggplot(aes(reorder(dv, difference, FUN=median), difference))+
  geom_hline(yintercept = 0, color='red', size=1)+
  geom_boxplot()+
  theme_bw()+
  xlab("")+
  ylab("Standardized difference")+
  coord_flip()
raw_retest_subs_test_data = raw_test_data[raw_test_data$sub_id %in% raw_retest_data$sub_id,]
# collect the names of all numeric columns in the raw test data
raw_numeric_cols = c()
for(i in 1:length(names(raw_test_data))){
  if(is.numeric(raw_test_data[,i])){
    raw_numeric_cols <- c(raw_numeric_cols, names(raw_test_data)[i])
  }
}
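For reference, the type-scan loop above can be collapsed into a one-liner over the columns; the small data frame here is a hypothetical stand-in for `raw_test_data`:

```r
# Hypothetical stand-in for raw_test_data
df <- data.frame(sub_id = c("s1", "s2"), rt = c(350, 410), acc = c(0.90, 0.95))
# Equivalent of the loop: keep the names of columns that are numeric
numeric_cols <- names(df)[sapply(df, is.numeric)]
stopifnot(identical(numeric_cols, c("rt", "acc")))
```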
raw_rel_df <- data.frame(spearman = rep(NA, length(raw_numeric_cols)),
                         icc = rep(NA, length(raw_numeric_cols)),
                         eta_sq = rep(NA, length(raw_numeric_cols)),
                         sem = rep(NA, length(raw_numeric_cols)))
row.names(raw_rel_df) <- raw_numeric_cols
for(i in 1:length(raw_numeric_cols)){
  raw_rel_df[raw_numeric_cols[i], 'spearman'] <- get_spearman(raw_numeric_cols[i], t1_df = raw_retest_subs_test_data, t2_df = raw_retest_data)
  raw_rel_df[raw_numeric_cols[i], 'icc'] <- get_icc(raw_numeric_cols[i], t1_df = raw_retest_subs_test_data, t2_df = raw_retest_data)
  raw_rel_df[raw_numeric_cols[i], 'eta_sq'] <- get_eta(raw_numeric_cols[i], t1_df = raw_retest_subs_test_data, t2_df = raw_retest_data)
  raw_rel_df[raw_numeric_cols[i], 'sem'] <- get_sem(raw_numeric_cols[i], t1_df = raw_retest_subs_test_data, t2_df = raw_retest_data)
}
raw_rel_df$dv = row.names(raw_rel_df)
row.names(raw_rel_df) = seq_len(nrow(raw_rel_df))
raw_rel_df$task = 'task'
raw_rel_df[grep('survey', raw_rel_df$dv), 'task'] = 'survey'
raw_rel_df = raw_rel_df %>%
  select(dv, task, spearman, icc, eta_sq, sem)
raw_rel_df %>%
  group_by(task) %>%
  summarise(median_spearman = median(spearman),
            median_icc = median(icc),
            median_eta_sq = median(eta_sq),
            median_sem = median(sem),
            num_vars = n()) %>%
  datatable() %>%
  formatRound(columns=c('median_spearman', 'median_icc', 'median_eta_sq', 'median_sem'), digits=3)
Plots of change in reliability depending on transformation
ggplotly(rel_df %>%
  mutate(dv = gsub(".logTr", "", dv)) %>%
  mutate(dv = gsub(".Re", "", dv)) %>%
  left_join(raw_rel_df, by = 'dv') %>%
  ggplot(aes(spearman.x, spearman.y, color = task.x, label=dv))+
  geom_point()+
  geom_abline(intercept = 0, slope=1)+
  theme_bw()+
  theme(legend.title = element_blank())+
  xlab("Spearman with transformation")+
  ylab("Spearman without transformation"))
ggplotly(rel_df %>%
  mutate(dv = gsub(".logTr", "", dv)) %>%
  mutate(dv = gsub(".Re", "", dv)) %>%
  left_join(raw_rel_df, by = 'dv') %>%
  ggplot(aes(icc.x, icc.y, color = task.x, label=dv))+
  geom_point()+
  geom_abline(intercept = 0, slope=1)+
  theme_bw()+
  theme(legend.title = element_blank())+
  xlab("ICC with transformation")+
  ylab("ICC without transformation"))
Summarized bootstrapped reliabilities
boot_df <- read.csv('/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_Retest_Analyses/input/bootstrap_merged.csv')
boot_df = boot_df %>%
  drop_na() %>%
  mutate(icc = as.numeric(as.character(icc)),
         spearman = as.numeric(as.character(spearman)),
         eta_sq = as.numeric(as.character(eta_sq)),
         sem = as.numeric(as.character(sem)))
boot_df %>%
  group_by(dv) %>%
  summarise(icc_median = quantile(icc, probs = 0.5),
            icc_2.5 = quantile(icc, probs = 0.025),
            icc_97.5 = quantile(icc, probs = 0.975),
            spearman_median = quantile(spearman, probs = 0.5),
            spearman_2.5 = quantile(spearman, probs = 0.025),
            spearman_97.5 = quantile(spearman, probs = 0.975)) %>%
  datatable() %>%
  formatRound(columns=c('icc_median', 'icc_2.5', 'icc_97.5', 'spearman_median', 'spearman_2.5', 'spearman_97.5'), digits=3)
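The `quantile` calls above implement plain percentile intervals over the bootstrap draws. A minimal sketch of that logic on simulated draws (the distribution here is made up purely for illustration; the real draws come from `bootstrap_merged.csv`):

```r
set.seed(1)
# Fake bootstrap distribution of an ICC, standing in for one dv's draws
boot_icc <- rnorm(1000, mean = 0.7, sd = 0.05)
# Median plus 95% percentile interval, as computed per dv above
ci <- quantile(boot_icc, probs = c(0.025, 0.5, 0.975))
# The interval endpoints should bracket the median
stopifnot(ci[1] < ci[2], ci[2] < ci[3])
```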
Distributions of bootstrapped reliabilities. Yellow points mark where our point estimates from the retest sample fall. Surveys are filled in red but are hard to see because the variance around their bootstrapped ICCs is very low.
boot_df %>%
  mutate(task = ifelse(grepl("survey", dv), "survey", "task")) %>%
  ggplot(aes(reorder(dv, icc, FUN=var), icc, fill = task))+
  geom_boxplot()+
  geom_hline(yintercept = 0, color='red', size=1)+
  geom_point(data = rel_df, aes(x=dv, y=icc), color = "yellow")+
  theme_bw()+
  theme(legend.title = element_blank())+
  xlab("")+
  coord_flip()
Comparison of survey measures to cognitive task measures in the bootstrapped results: a multilevel model with random intercepts for each measure and a fixed effect of survey versus cognitive measure. (Same result as seen in the point-estimate comparison.)
boot_df = boot_df %>%
  mutate(task = ifelse(grepl("survey", dv), "survey", "task"))
summary(lmerTest::lmer(icc ~ task + (1|dv), boot_df))
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [lmerMod]
Formula: icc ~ task + (1 | dv)
Data: boot_df
REML criterion at convergence: -733184
Scaled residuals:
Min 1Q Median 3Q Max
-11.499 -0.399 0.020 0.450 7.476
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.05781 0.240
Residual 0.00689 0.083
Number of obs: 344000, groups: dv, 344
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.8657 0.0287 345.0000 30.1 <2e-16 ***
tasktask -0.4062 0.0322 345.0000 -12.6 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
tasktask -0.892
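Reading the fixed effects above: the intercept is the expected ICC for surveys (the reference level of `task`), and `tasktask` is the offset for cognitive task measures, so the implied task-level ICC is the sum of the two:

```r
# Implied group-level ICCs from the fixed effects in the summary above
survey_icc <- 0.8657   # intercept: reference level is "survey"
task_offset <- -0.4062 # tasktask coefficient
task_icc <- survey_icc + task_offset
stopifnot(isTRUE(all.equal(task_icc, 0.4595)))
```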
Checking the DDM results in the bootstrapped estimates. The pattern is the same as before.
tmp = measure_labels %>%
  mutate(dv = as.character(dv)) %>%
  filter(task == 'task') %>%
  select(-measure_description, -notes, -measure_type, -task) %>%
  separate(variable_type, c("label_1", "label_2", "label_3"), sep = ",") %>%
  mutate(label_1 = trim(label_1),
         label_2 = trim(label_2),
         label_3 = trim(label_3)) %>%
  filter(ddm_task == 1) %>%
  separate(dv, c('task_name', 'extra_1', 'extra_2'), sep = '\\.', remove=FALSE) %>%
  select(-extra_1, -extra_2) %>%
  drop_na() %>%
  left_join(boot_df[,c("dv", "icc")], by = 'dv')
summary(lmerTest::lmer(icc ~ label_1*label_3 + (1|dv), tmp))
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [lmerMod]
Formula: icc ~ label_1 * label_3 + (1 | dv)
Data: tmp
REML criterion at convergence: -384875
Scaled residuals:
Min 1Q Median 3Q Max
-10.078 -0.479 0.035 0.519 5.634
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.03079 0.1755
Residual 0.00896 0.0947
Number of obs: 206000, groups: dv, 206
Fixed effects:
Estimate Std. Error df t value
(Intercept) 0.3186 0.0555 196.9000 5.74
label_1overall 0.2614 0.0751 196.9000 3.48
label_3drift rate -0.0551 0.0636 196.9000 -0.87
label_3non-decision -0.0982 0.0707 196.9000 -1.39
label_3rt -0.0735 0.0674 196.9000 -1.09
label_3threshold -0.2547 0.0707 196.9000 -3.60
label_1overall:label_3drift rate 0.1850 0.0878 196.9000 2.11
label_1overall:label_3non-decision 0.0513 0.0931 196.9000 0.55
label_1overall:label_3rt 0.1880 0.0957 196.9000 1.96
label_1overall:label_3threshold 0.2701 0.0931 196.9000 2.90
Pr(>|t|)
(Intercept) 3.5e-08 ***
label_1overall 0.00062 ***
label_3drift rate 0.38739
label_3non-decision 0.16688
label_3rt 0.27727
label_3threshold 0.00040 ***
label_1overall:label_3drift rate 0.03641 *
label_1overall:label_3non-decision 0.58230
label_1overall:label_3rt 0.05095 .
label_1overall:label_3threshold 0.00415 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) lbl_1v lbl_3dr lbl_3- lbl_3rt lbl_3t l_1:_r
label_1vrll -0.739
lbl_3drftrt -0.873 0.645
lbl_3nn-dcs -0.784 0.579 0.685
label_3rt -0.823 0.608 0.718 0.646
lbl_3thrshl -0.784 0.579 0.685 0.615 0.646
lbl_1vr:_3r 0.632 -0.856 -0.724 -0.496 -0.520 -0.496
lbl_1vr:_3- 0.596 -0.807 -0.520 -0.760 -0.491 -0.468 0.691
lbl_1vrll:lbl_3r 0.580 -0.785 -0.506 -0.455 -0.704 -0.455 0.672
lbl_1vrll:lbl_3t 0.596 -0.807 -0.520 -0.468 -0.491 -0.760 0.691
l_1:_3- lbl_1vrll:lbl_3r
label_1vrll
lbl_3drftrt
lbl_3nn-dcs
label_3rt
lbl_3thrshl
lbl_1vr:_3r
lbl_1vr:_3-
lbl_1vrll:lbl_3r 0.633
lbl_1vrll:lbl_3t 0.651 0.633